Supervised/Unsupervised Voice Activity Detectors for Text-dependent Speaker Recognition on the RSR2015 Corpus
Abstract
Voice activity detection (VAD), i.e., discrimination between the speech and nonspeech segments of a speech signal, is an important enabling technology for a variety of speech-based applications, including speaker recognition. In this work we evaluate the following supervised and unsupervised VAD algorithms in the context of text-dependent speaker recognition on the RSR2015 (Robust Speaker Recognition 2015) task: energy-based VAD with and without a hangover scheme and endpoint detection, vector quantization-based VAD, Gaussian mixture model (GMM)-based VAD (in both supervised and unsupervised variants), and sequential GMM-based VAD. Experimental results show that both the supervised and unsupervised GMM-based VADs perform better than the other VAD algorithms. Considering all three evaluation metrics (equal error rate, and the old (SRE 2008) and new (SRE 2010) normalized detection cost functions), the unsupervised GMM-based VAD performed best.
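As a rough illustration of the simplest method named above, the following is a minimal sketch of an energy-based VAD with an optional hangover scheme. It is not the implementation evaluated in the paper: the function names, the non-overlapping framing, the threshold rule (a fixed margin below the loudest frame) and all default values are assumptions chosen for illustration only.

```python
import math


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on all-zero frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_vad(samples, frame_len=160, margin_db=30.0, hangover=0):
    """Return one boolean per frame (True = speech).

    Assumed decision rule: a frame is speech if its energy lies within
    `margin_db` of the loudest frame. With `hangover` > 0, that many
    frames after the last speech frame are also kept as speech, so that
    weak word endings are not cut off.
    """
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    threshold = max(energies) - margin_db
    labels = []
    countdown = 0
    for energy in energies:
        if energy > threshold:
            countdown = hangover  # restart the hangover counter
            labels.append(True)
        elif countdown > 0:
            countdown -= 1        # still inside the hangover window
            labels.append(True)
        else:
            labels.append(False)
    return labels


# Toy signal: 3 quiet frames, 2 loud frames, 4 quiet frames (4 samples each).
signal = [0.001] * 12 + [1.0, -1.0] * 4 + [0.001] * 16
plain = energy_vad(signal, frame_len=4, hangover=0)
smoothed = energy_vad(signal, frame_len=4, hangover=2)
```

In this toy signal the hangover variant extends the detected speech region by two frames past the last loud frame, which is the only difference between the two calls.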
Similar resources
Extended RSR2015 for text-dependent speaker verification over VHF channel
Text-dependent speaker verification over a degraded radio channel is a challenging task. To better understand the research problem, the Institute for Infocomm Research (I2R) of Singapore has collected a corpus of voice recordings transmitted over marine VHF. Built as an extension of the RSR2015 database, the VHF-RSR2015 consists of recordings from 300 speakers of Part I of the RSR2015 database tr...
The RSR2015: Database for Text-Dependent Speaker Verification using Multiple Pass-Phrases
This paper describes a new speech corpus, the RSR2015 database, designed for text-dependent speaker recognition with a scenario based on fixed pass-phrases. This database consists of over 71 hours of speech recorded from English speakers covering the diversity of accents spoken in Singapore. Acquisition has been done using a set of six portable devices including smart phones and tablets. The pool ...
Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints
Voice conversion, a technique to change one’s voice to sound like that of another, poses a threat to even high-performance speaker verification systems. The vulnerability of text-independent speaker verification systems to spoofing attacks using statistical voice conversion was evaluated and confirmed in our previous work. In this paper, we further extend the study to text-dependent sp...
Singing speaker clustering based on subspace learning in the GMM mean supervector space
In this study, we propose algorithms based on subspace learning in the GMM mean supervector space to improve the performance of speaker clustering with speech from both reading and singing. As a speaking style, singing introduces changes in the time-frequency structure of a speaker’s voice. The purpose of this study is to introduce advancements for speech systems such as speech indexing and retriev...